-
-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clear tmp dir #344
Clear tmp dir #344
Conversation
// wipe tmp dir and log info | ||
try { | ||
// Run command | ||
String[] command = {"find", "/tmp", "-ctime", "+" + TMP_DIR_AGE_LIMIT_DAYS, "-exec", "rm", "-rf", "{}", "+"}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, I suggest investigating a java-based solution instead of using a subprocess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
423a428
to
673d8c1
Compare
|
||
@Scheduled(cron = "${cron.clearTmpDir}") | ||
public void clearTmpDir() { | ||
// TODO: find source of tmp dir creation and delete from there |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 we can change the PR to draft until the TODO
s are resolved.
// wipe tmp dir and log info | ||
try { | ||
// Run command | ||
String[] command = {"find", "/tmp", "-ctime", "+" + TMP_DIR_AGE_LIMIT_DAYS, "-exec", "rm", "-rf", "{}", "+"}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
if (exitCode == 0) { | ||
inputGobbler.join(); | ||
logger.info(inputGobbler.getContent()); | ||
logger.info("Successfully Cleared /tmp directory"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/Cleared/cleared for consistency with other messages? 😬
Also, once the TODO
above about locating the tmp dir is solved, we should probably include the location in the logs to help in case a sysadmin needs to check the directory permissions, modes, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/Cleared/cleared for consistency with other messages? 😬
Also, once the
TODO
above about locating the tmp dir is solved, we should probably include the location in the logs to help in case a sysadmin needs to check the directory permissions, modes, etc.
Hi @kinow ,
The idea was to delete the file within the process that actually creates these files.
But that would be a mistake since the file may still be in use.
Another idea was to delete the file the moment it is no longer in use, but an exact time would be difficult to determine
Hence a solution that works would be to delete all files above a certain age.
so I would be removing the TODO
remark.
But before I do that, do you have any comments on a much better approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me. The other comment about using a Java native solution is still valid I think.
So instead of forking a new process we could try first using Java to iterate/walk the temp dir, filter for files/dirs older than the threshold, and finally delete or mark them for deletion (I assume we will try to delete it and in case it fails, call deleteOnExit
or just log it... maybe Commons IO has something that we can use... just food for thought 👍 )
Moves the changes in that refactor the scheduler to this branch.
@@ -48,6 +48,7 @@ spring.data.mongodb.port = 27017 | |||
sparql.endpoint = http://localhost:3030/cwlviewer/ | |||
|
|||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit-picking, but unnecessary blank line 😬
|
||
String[] command = {"find", directoryPath, "-ctime", "+" + days, "-exec", "rm", "-rf", "{}", "+"}; | ||
ProcessBuilder clearProcess = new ProcessBuilder(command); | ||
Process process = clearProcess.start(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code was in Scheduler.java
before in this PR, and then was moved here, so it's not showing @tetron 's review comment. The comment is still visible in the GitHub UI, but will copy it here just in case that disappears:
As discussed, I suggest investigating a java-based solution instead of using a subprocess.
|
||
} | ||
|
||
private void deleteWithinDirectoryCMD(String directoryPath, int days) throws IOException, InterruptedException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably CMD
-> Cmd
or Command
? I don't have a strong preference for CMD or Cmd, but given we have classes like CwlViewerApplication
and not CWLViewerApplication
, I take it the project is using the convention of keeping only the first letter of an abbreviation upper-case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I searched this file in GitHub UI for where it was used... but apparently it's not used anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah! Got it! We are now using deleteWithinDirectory
. IMO we probably don't need/want this method (and the one above if not used elsewhere) now that we have deleteWithinDirectory
. So probably delete deleteWithinDirectoryCMD
and other private functions that are used only in deleteWithinDirectoryCMD
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, will do that. Left them both for the review.
if (files != null) { | ||
for (File subfile : files) { | ||
if (subfile.isDirectory()) { | ||
deleteWithinDirectory(subfile, days); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrong indentation.
subfile.delete(); | ||
logger.info("Deleted directory " + subfile.getPath()); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The if
/for
/if`... Could possibly be rewritten in Java 8's streams and filters... but not a blocker, can be done in a follow-up.
|
||
public FileUtils(Logger logger) { | ||
this.logger = logger; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessary? ☝️
Normally utility classes have private constructors and static public methods. But since this is not using static methods, I think we should delete this constructor, and create a logger here too, as it's done in other files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I struggled a bit with this decision because
- I wanted the ability to log the individual steps in a way that should show the scheduled process
- I didn't want the schedule function to deal with exceptions.
but looking at your comment. I guess I could have different logs at the scheduled function level and also the file utils level. That way the util logs would be nested and surrounded by the scheduled functions logs
|
||
public class FileUtils { | ||
|
||
private final Logger logger; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private final Logger logger; | |
private final Logger logger = LoggerFactory.getLogger(this.getClass()); |
public void clearTmpDir() { | ||
// wipe tmp dir and log info | ||
FileUtils utils = new FileUtils(logger); | ||
utils.clearDirectory("/tmp", TMP_DIR_AGE_LIMIT_DAYS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I run cwlviewer
in my laptop, directly from IntelliJ, executing the SpringBoot application. If I understood correctly, here if I were to leave the process running until midnight, then it would go over my /tmp
directory removing any files older than the age limit.
I have other applications running that also use /tmp
to store temporary files. That would probably cause issues for me. So in this case, I would probably store temporary data for cwlviewer
elsewhere, and expect the Scheduler
to be able to cope with that too.
So I think we should:
- confirm
cwlviewer
is using/tmp
(not sure if anyone uses Windows withcwlviewer
, but someone could be using a different TMPDIR in linux too, normally in Java we rely onjava.io.tmpdir
) - if we are using
/tmp
, or something that resolves/tmp
, then we need to make sure it doesn't affect other processes running too, so we either- delete only cwlviewer files
- move the temporary directory to another configurable location
… and Commons IO to traverse and delete old files and directories.
… and Commons IO to traverse and delete old files and directories.
… and Commons IO to traverse and delete old files and directories.
… and Commons IO to traverse and delete old files and directories.
…rse and delete old files and directories.
Description
This creates a scheduled process to clear the /tmp dir of files older than a day every single day
fixes #200 #279
Motivation and Context
Previous to this the issue of large files(pull workflow repositories) being stuck in the /tmp dir caused issues on the server and were cleared manually or with a system cron job. This PR moves that process within the CWL viewer application.
How Has This Been Tested?
Running the application and checking that the /tmp files get cleared as scheduled
Screenshots (if appropriate):
Types of changes
Checklist: